Infrastructure for saving/loading hls4ml models #1158
Description
Adds the ability to save and load hls4ml models (i.e., to serialize and deserialize them). Given a `ModelGraph`, this serializes it into a single file that can be loaded at a later stage. The saved model does not depend on the original Keras/PyTorch/ONNX model in any way.

The feature is in part inspired by Keras' model-saving feature. The main format used for serialization is JSON: all objects save their state in dictionaries, which are serialized into JSON. Assuming disk space is not a problem, the generated JSON is nicely formatted when written to file. No objects are pickled, as that is far too unsafe. The NumPy arrays (weights) are saved in `.npz` format. The model graph (the list of layers), the model information, and the config are saved into separate files. These (along with some versioning information) are packaged into a `.fml` file, which is just a `.tar.gz` with a different name.
Internally, this works by adding a few methods to types, quantizers, layers, and the model graph itself. The interface is defined by the `Serializable` class. Classes typically implement the `serialize_state()` method, which should return a dictionary of the object's current state. There is also a `serialize_class_name()` method, needed to record which class instance is being saved, though most classes won't need to deal with it. Deserialization is done with the class method `deserialize()`.
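A minimal sketch of what such an interface could look like; the method names come from the description above, but the bodies and the combining `serialize()` helper are illustrative assumptions, not the actual hls4ml code.

```python
class Serializable:
    """Illustrative interface; types, quantizers, layers, and the model
    graph would implement (parts of) it."""

    def serialize_state(self) -> dict:
        # Return a JSON-compatible dict describing the current state.
        raise NotImplementedError

    def serialize_class_name(self) -> str:
        # Record which class is being saved; most subclasses could rely
        # on a default like this and never override it.
        cls = self.__class__
        return f'{cls.__module__}.{cls.__qualname__}'

    def serialize(self) -> dict:
        # Hypothetical helper combining the two methods above.
        return {
            'class_name': self.serialize_class_name(),
            'state': self.serialize_state(),
        }

    @classmethod
    def deserialize(cls, state: dict) -> 'Serializable':
        # Rebuild an instance from a previously serialized state dict.
        raise NotImplementedError
```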
To support this feature, some restructuring had to be done. `ModelGraph` was intended to be created only from a layer list produced by a converter, which is not compatible with (de)serialization, so its construction was split into the initialization of an empty `ModelGraph` and the conversion of the converter-produced layer list into `Layer` objects. Furthermore, `Layer`'s initialization has to be skipped, as we are essentially restoring a post-initialization state (see the sketch below). Types and quantizers are more straightforward to save and load. A loaded model should be indistinguishable from the original, though there may be corner cases where hacks to the internal state of layers (or partially optimized models) don't survive the round trip; we can catch these over time. But for "final" models (ones you're happy enough with to call `write()`/`compile()`/`build()` on), saving/loading should always work.
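The "skipped initialization" can be pictured with the standard Python pattern below; this is an illustration of the idea, not the actual `Layer` code.

```python
class Layer:
    def __init__(self, model, name, attributes):
        # Normal construction path, driven by a converter-produced layer list.
        ...

    @classmethod
    def deserialize(cls, state: dict) -> 'Layer':
        # Bypass __init__ entirely: allocate a bare instance and restore
        # the saved post-initialization state onto it.
        obj = cls.__new__(cls)
        obj.__dict__.update(state)
        return obj
```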
One somewhat ugly part of the current implementation is that, due to the creation of dynamic wrapper classes, we cannot deserialize directly into them. Instead, we create the original types and have to run the `<backend>:specific_types` optimizer to obtain an object truly identical to the original. Running that optimizer for a given backend looks a bit hacky, but it is fine for now since all backends have an optimizer by that name.
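As a sketch of that last step (with an assumed `optimize_model` signature and an assumed way of naming the pass; only the `<backend>:specific_types` naming scheme comes from the description above):

```python
from hls4ml.model.optimizer import optimize_model

def restore_backend_specific_types(model, backend: str):
    # After deserialization the model holds the original (unwrapped) types;
    # re-running the backend's 'specific_types' pass recreates the dynamic
    # wrapper classes, e.g. 'vivado:specific_types' for the Vivado backend.
    optimize_model(model, [f'{backend.lower()}:specific_types'])
```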
Type of change
Tests
Included is a test in `test_serialization.py` that tests saving and loading QKeras and QONNX models. These cover serialization of most types and quantizers that can appear in a model, but obviously not all possible layers. A more thorough test would extend most existing tests to save and load a model and then continue working with the loaded one, but I'll leave that to a future PR. A sketch of the round-trip check is below.
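For illustration, the round-trip check can be sketched roughly as follows; `build_test_model()`, the save/load entry points, and the input shape are placeholders, not the exact contents of `test_serialization.py`.

```python
import numpy as np

import hls4ml

def test_save_load_roundtrip(tmp_path):
    model = build_test_model()  # placeholder: a converted ModelGraph
    model.compile()

    x = np.random.rand(100, 16).astype(np.float32)  # placeholder input shape
    y_ref = model.predict(x)

    path = str(tmp_path / 'model.fml')
    model.save(path)  # assumed save entry point
    loaded = hls4ml.utils.serialization.deserialize_model(path)  # assumed loader

    loaded.compile()
    np.testing.assert_array_equal(y_ref, loaded.predict(x))
```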
Checklist
I've done all the usual checks prior to opening this PR.